Supplementing change streams

ABSTRACT

Aspects of the present invention disclose a method, computer program product, and system for analyzing change stream data. The method includes identifying, by one or more computer processors, a list of changes in a replication stream. The method further includes determining, by one or more computer processors, if one or more changes in the list of changes matches a criteria for a hint. The method further includes in response to determining that one or more of the changes in the list of changes matches the criteria for a hint, inserting, by one or more computer processors, the hint into the list of changes prior to a first change of the one or more changes that triggers a match in criteria.

BACKGROUND OF THE INVENTION

The present invention relates generally to change streams, and more particularly to supplementing change streams.

Replication is the process of sharing database objects and data to multiple databases. To maintain replicated database objects and data in multiple databases, a change to one of these database objects at a database is shared with the other databases. In this way, the database objects and data are kept synchronized at all of the databases in the replication environment. In a replication environment, the database where a change originates is called the source database, and a database where a change is shared is called a target database.

SUMMARY

Aspects of the present invention disclose a method, computer program product, and system for analyzing change stream data. The method includes identifying, by one or more computer processors, a list of changes in a replication stream. The method further includes determining, by one or more computer processors, if one or more changes in the list of changes matches a criteria for a hint. The method further includes in response to determining that one or more of the changes in the list of changes matches the criteria for a hint, inserting, by one or more computer processors, the hint into the list of changes prior to a first change of the one or more changes that triggers a match in criteria.

Another embodiment of the present invention discloses a method for analyzing change stream data. The method includes identifying, by one or more computer processors, a pattern of changes in a replication stream. The method further includes identifying, by one or more computer processors, a total number of changes in the pattern of changes that occur on a first database. The method further includes determining, by one or more computer processors, if one or more changes on a second database, wherein the one or more changes on the second database is less than the total number of changes, can be implemented to create a same outcome on the second database as the outcome of the total number of changes on the first database. The method further includes in response to determining that the one or more changes to the second database is less than the total number of changes to the first database and creates the same outcome on the second database as the total number of changes on the first database, inserting, by one or more computer processors, a hint prior to a first change in the pattern of changes in the replication stream indicating the determined one or more changes should be implemented on the second database.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a functional block diagram illustrating a distributed data processing environment, in accordance with one embodiment of the present invention;

FIG. 2 is a flowchart depicting operational steps of a program for supplementing change streams for logical replication, executing within the computing system of FIG. 1, in accordance with one embodiment of the present invention; and

FIG. 3 depicts a block diagram of components of the server and/or the computing device of FIG. 1, in accordance with another embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention recognize that the ability to push data changes into a destination is frequently a bottleneck for the performance of a change data replication solution. When a bottleneck does not allow for throughput that matches the rate of changes occurring on the source system, then the target can become latent, significantly impacting the business value of the target system.

Embodiments of the present invention recognize that currently a user must select the best strategy for dealing with a bottleneck based on the user's knowledge of the workload. The user may need to adjust the strategy as the user's workload changes. Embodiments of the present invention recognize that the emergence of new high performance streaming analytics engines creates an opportunity to analyze a change stream before the data reaches the apply phase so as to provide real time guidance to the apply phase regarding the optimal strategy.

Implementation of embodiments of the invention may take a variety of forms, and exemplary implementation details are discussed subsequently with reference to the figures.

The present invention will now be described in detail with reference to the figures. FIG. 1 is a functional block diagram of computing system 100, in accordance with one embodiment of the present invention. FIG. 1 provides only an illustration of one implementation and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made by those skilled in the art without departing from the scope of the invention as recited by the claims.

In the depicted environment, computing system 100 includes computing device 102, computing device 104, and computing device 106 connected to network 112. Network 112 may be a local area network (LAN), a wide area network (WAN), such as the Internet, a cellular data network, any combination thereof, or any combination of connections and protocols that will support communications between computing devices, in accordance with embodiments of the invention. Network 112 may include wired, wireless, or fiber optic connections. Network 112 includes one or more wired and/or wireless networks that are capable of receiving and transmitting data, voice, and/or video signals, including multimedia signals that include voice, data, and video information. Computing system 100 may have other devices not shown that are able to communicate with computing device 102, computing device 104, and computing device 106 via network 112.

Computing device 102 may be any computing device, such as a management server, a web server, a desktop computer, a laptop computer, a netbook computer, a smart phone, or a tablet computer. In general, computing device 102 may be any electronic device or computing system capable of processing program instructions, for sending and receiving data with network 112. In other embodiments, computing device 102 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. In the depicted embodiment, computing device 102 contains capture program 110 and source database 140. In some embodiments, computing device 102 may include additional programs, databases, or interfaces which are not depicted. In some embodiments, computing device 102 may be connected to multiple different computing devices (not depicted) that send change instructions to computing device 102 indicating changes to be made on source database 140. For example, computing device 102 is connected to multiple other computing devices via network 112, some of which change source database 140. Computing device 102 is depicted and described in further detail with respect to FIG. 3.

In depicted computing system 100, capture program 110 resides on computing device 102. In some embodiments, capture program 110 may reside on another computing device, but oversees changes to source database 140 via network 112. In some embodiments, capture program 110 logs changes to a source database (e.g., source database 140) as changes occur. For example, capture program 110 creates a log, also known as a queue, of each change to source database 140. In various examples, capture program 110 may capture changes to source database 140 in real time, as the requests for changes are received, as the changes are performed, etc. In some embodiments, capture program 110 may send changes to a target database (e.g., target database 160). In an example, capture program 110 receives multiple changes to source database 140 from multiple different computing devices (not depicted), implements the changes to source database 140, creates a list of changes which have occurred, and then sends the changes to computing device 106 to be applied to target database 160. In other embodiments, computing device 102 may have other programs for receiving change requests, performing changes to source database 140, and/or sending changes to computing device 106. In some embodiments, changes for source database 140 are being received from multiple other computing devices (not depicted) by computing device 102 faster than computing device 102 can forward those change requests to computing device 106, but capture program 110 logs all changes to source database 140 as the changes occur.

Source database 140 may be a repository that may be written to and/or read by capture program 110 and analysis program 120. In an embodiment, source database 140 is an organized collection of data. In some embodiments, source database 140 is a remote database for multiple users of multiple computing devices, all sharing and using the same stored information. In some embodiments, source database 140 may contain tables of information which are in constant flux as users of other computing devices (not depicted) change various items and entries stored in source database 140. In some examples, a user of another computing device (not depicted) may change a single entry, a row, a column, multiple rows, an entire table, etc. In various embodiments, other programs (not depicted) or other computing devices (not depicted) may store or change information on source database 140. In various embodiments, capture program 110 may identify changes that occur in source database 140 and log the changes. In some embodiments, analysis program 120 may also identify changes that occur in source database 140. In other embodiments, source database 140 may reside on a server, another computing device (not depicted), or independently as a standalone database that is capable of communicating with computing device 102, computing device 104, and computing device 106 via network 112.

Computing device 104 may be any computing device, such as a management server, a web server, a desktop computer, a laptop computer, a netbook computer, a smart phone, or a tablet computer. In general, computing device 104 may be any electronic device or computing system capable of processing program instructions, for sending and receiving data with network 112. In other embodiments, computing device 104 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. In the depicted embodiment, computing device 104 contains analysis program 120 and hint database 150. In some embodiments, computing device 104 may include additional programs, databases, or interfaces which are not depicted. In some embodiments, computing device 104 may be connected directly to computing device 102 prior to computing device 102 connecting to network 112. In other embodiments, computing device 104 is connected to computing device 102 via network 112. Computing device 104 is depicted and described in further detail with respect to FIG. 3.

In depicted computing system 100, analysis program 120 resides on computing device 104. In some embodiments, analysis program 120 may reside on another computing device, but monitors the change log created by capture program 110 via network 112. In various embodiments, analysis program 120 monitors a log of changes created by capture program 110 and identifies possible hints that match occurrences within the log. In some embodiments, analysis program 120 identifies multiple occurrences that match hints stored in hint database 150, and then inserts hints into the log created by capture program 110 to be implemented on target database 160. In some embodiments, analysis program 120 may have hundreds or even thousands of changes to monitor, and analysis program 120 may insert hints into the created log or queue after the changes have already been implemented on target database 160.

In some embodiments, analysis program 120 resides beside a change stream (e.g., a log of changes to a database that is sent to another database to replicate the same changes, also known as a replication stream), acting as one of multiple readers of the change stream. If analysis program 120 determines a hint may be useful, analysis program 120 inserts the hint into the stream at the appropriate point (e.g., before the changes that are impacted/reflected by the hint). In some embodiments, apply program 130 may not wait for a hint, and therefore, analysis program 120 may insert hints that provide no benefit because apply program 130 has already read past the point of the inserted hint in the change stream.

Hint database 150 may be a repository that may be written to and/or read by analysis program 120. In an embodiment, hint database 150 is an organized collection of data. In some embodiments, hint database 150 may contain hints, which include procedures to be utilized by a program (e.g., apply program 130) applying changes to a target database (e.g., target database 160) to reduce the number of changes to implement on a target database from a source database, but still resulting in the same end product. For example, analysis program 120 reviews a log or queue of changes to source database 140 and references hint database 150 to determine if any changes or sequences of changes match any protocols stored in hint database 150. In some embodiments, hint database 150 may include multiple possible sequences that equate to multiple possible hints. In some embodiments, hint database 150 may reside on a server, another computing device (not depicted), or independently as a standalone database that is capable of communicating with computing device 102, computing device 104, and computing device 106 via network 112.

Some example hints may include a command (e.g., to delete an entire table) based upon a sequence identified in an incoming data stream. One example of a hint may include transaction dependencies. For example, information regarding previous transactions, which have data dependencies on other transactions (e.g., change the same row of a table), allows an apply program to create parallel streams of independent transactions that can be applied without synchronization and with no risk of row deadlocks. In another example, a hint may be a simple batch transaction. For example, batch jobs often create large transactions with a single type of operation (e.g., 10,000,000 inserts caused by a single INSERT AS SELECT statement). Such transactions can be applied in parallel. Another example hint may include table reorganizations. For example, some transactions will not actually result in changes to the table data, but instead record internal operations that were done to reorganize data in the source table to improve efficiency. In some examples, the capture program is able to identify the reorganization transactions, but not universally. A hint may include ignoring the transaction. Yet another example hint may include relative table load. For example, the load across several tables will change over time. An apply program can often optimize performance based on tying a given table to a given apply thread, which works best when each apply thread will get an equivalent amount of work. Analysis program 120 may include multiple hints that allow the apply program to spread the tables across the threads to get an even balance.
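As an illustration of the relative table load hint, the following minimal sketch spreads tables across apply threads so that each thread receives a roughly equal amount of work. The function name balance_tables, the shape of the load data, and the greedy heuristic (assign the heaviest remaining table to the least loaded thread) are assumptions made for this example; the description does not specify how the balance is computed.

```python
import heapq


def balance_tables(table_loads, num_threads):
    """Assign each table to one apply thread, keeping per-thread load roughly even.

    table_loads: dict mapping a table name to its observed number of changes
    (hypothetical input; in practice this might come from the change stream).
    Returns a dict mapping a thread index to the list of tables it handles.
    """
    # Min-heap of (accumulated load, thread index): the least loaded thread is on top.
    heap = [(0, thread) for thread in range(num_threads)]
    heapq.heapify(heap)
    assignment = {thread: [] for thread in range(num_threads)}

    # Greedy heuristic: place the heaviest tables first, ties broken by table name.
    for table, load in sorted(table_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, thread = heapq.heappop(heap)
        assignment[thread].append(table)
        heapq.heappush(heap, (total + load, thread))
    return assignment


if __name__ == "__main__":
    loads = {"A": 50, "B": 30, "C": 20, "D": 10}
    print(balance_tables(loads, 2))
```

A hint carrying such an assignment could be refreshed as the load across tables changes over time.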

Computing device 106 may be any computing device, such as a management server, a web server, a desktop computer, a laptop computer, a netbook computer, a smart phone, or a tablet computer. In general, computing device 106 may be any electronic device or computing system capable of processing program instructions, for sending and receiving data with network 112. In other embodiments, computing device 106 may represent a server computing system utilizing multiple computers as a server system, such as in a cloud computing environment. In the depicted embodiment, computing device 106 contains apply program 130 and target database 160. In some embodiments, computing device 106 may include additional programs, databases, or interfaces which are not depicted. In some embodiments, computing device 106 is connected to computing device 102 via network 112. Computing device 106 is depicted and described in further detail with respect to FIG. 3.

In depicted computing system 100, apply program 130 resides on computing device 106. In some embodiments, apply program 130 may reside on another computing device, but applies changes in a source database (e.g., source database 140) to a target database (e.g., target database 160). In some embodiments, apply program 130 may monitor changes to a source database, and apply the changes to a target database. In other embodiments, apply program 130 may receive a queue of changes previously captured by another program (e.g., capture program 110) and apply the received changes to a target database. In yet other embodiments, apply program 130 may query another program for a log of changes to a source database and then apply the same changes to a target database. In various embodiments, apply program 130 may lag behind a capture program, as the apply program has to adjust the target database one individual change at a time, as opposed to block changes which may be applied to a source database.

In an example, apply program 130 is currently maintaining four connections into the target database and has already applied some uncommitted changes in each of these connections. Apply program 130 is currently running in a mode using hash partitioning on key values to distribute rows across the connections. Apply program 130 then encounters a hint indicating the following set of N operations was the result of a single update statement. Apply program 130 may have a heuristic algorithm that determines that N is very large (e.g., 1,000,000), in which case the right strategy could take advantage of the hint about the single update statement. Apply program 130 may first need to commit all four connections to target database 160 (e.g., as the update statement might affect rows that had already been touched in each or any of the connections), and then use one of the connections to run the update statement. In an example of a small N (e.g., 100), apply program 130 is probably more efficient avoiding the extra commit, and continuing the strategy of fanning out the 100 operations against the four connections, thereby ignoring the hint. In various examples, apply program 130 does not need to apply the hints. In an example, several hints are available and apply program 130 may choose between the various hints.
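The decision described in this example can be sketched as follows. This is a minimal illustration, not the apply program's actual heuristic: the SingleStatementHint type, the step names, and the threshold value of 10,000 are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class SingleStatementHint:
    """Hypothetical hint: the next operation_count changes came from one statement."""
    statement: str
    operation_count: int


def choose_strategy(hint, threshold=10_000):
    """Return the ordered steps an apply program might take for this hint.

    threshold is an assumed tuning value; the description gives only the
    examples N = 1,000,000 (use the hint) and N = 100 (ignore the hint).
    """
    if hint.operation_count >= threshold:
        # Large N: commit every open connection first, because the statement may
        # touch rows already changed (uncommitted) on any of them, then run the
        # single statement on one connection.
        return ["commit_all_connections", "run_statement_on_one_connection"]
    # Small N: avoid the extra commit and keep fanning rows out across connections.
    return ["fan_out_operations"]


if __name__ == "__main__":
    big = SingleStatementHint("UPDATE Table1 SET status = 'done'", 1_000_000)
    small = SingleStatementHint("UPDATE Table1 SET status = 'done'", 100)
    print(choose_strategy(big))
    print(choose_strategy(small))
```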

Target database 160 may be a repository that may be written to and/or read by apply program 130. In an embodiment, target database 160 is an organized collection of data. In some embodiments, target database 160 is a remote database for backup purposes of a source database. In other embodiments, target database 160 may be a different type of database utilized for logical replication. In some embodiments, target database 160 may contain tables of information, which may change as changes are made to a source database. In some embodiments, target database 160 may not be an exact replica of a source database. For example, target database 160 is a target database for multiple source databases. In another example, target database 160 may lag in replicating changes because apply program 130 is not keeping pace with changes made to source database 140. In other embodiments, target database 160 may reside on a server, another computing device (not depicted), or independently as a standalone database that is capable of communicating with computing device 102, computing device 104, and computing device 106 via network 112.

FIG. 2 is a flowchart depicting operational steps of program 200, which is a function of analysis program 120, in accordance with an embodiment of the present invention. In some embodiments, the operational steps of program 200 begin at the prompt of an administrator. In other embodiments, the operational steps of program 200 begin in response to a backlog amount of time between updates to a target database and changes which occurred to a source database. In yet other embodiments, program 200 is constantly running.

Program 200 identifies a list of hints (step 202). In various embodiments, program 200 may receive multiple hints from an administrator to store in a database (e.g., hint database 150). In other embodiments, program 200 may receive hints from another computer or program (e.g., a cognitive program that creates hints based upon analysis of changes to a database). In yet other embodiments, program 200 may not store or receive hints, but rather identify hints located in a database (e.g., hint database 150). In some embodiments, hints may be stored on multiple computing devices, and program 200 identifies each possible hint to be applied. In some examples, program 200 may search for a flag or label which identifies a hint. For example, all hints may start with the same word, sequence, or label.

In some embodiments, administrators may configure hint priorities. For example, if a hint is indicated as important and transactional atomicity must be maintained, then hints about transaction dependency would be very important. In the example, the hint may have a priority number, and a threshold number must be reached to include a hint. In another example, if a hint indicates that maximum performance is more important, even if small windows may exist where transactional atomicity is not maintained, program 200 may ignore the possible hint.

Program 200 identifies the criteria for each hint (step 204). In various embodiments, program 200 identifies the criteria of each of the hints identified in step 202. For example, program 200 identifies that a hint states, “delete row table”, which is triggered by multiple sequential deletes in the same row of a source database. In some embodiments, a single action may identify a specific hint. For example, a change stream only contains a specific entry if a specific hint is to apply. For the purposes of this application, the words entry, operation, and action can be used interchangeably. In another embodiment, a single entry in a change stream may invoke the possibility of multiple hints. In one example, the criteria may be that the last N rows (e.g., 15 rows in the stream) have each been an Insert operation for the same table. In another example, the criteria may be that the latest transaction makes changes on rows that were previously changed in transactions x and y.
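The two example criteria at the end of this step can be written as simple predicates over the change stream. The sketch below is illustrative only; the Change type, the function names, and the representation of a transaction as a set of (table, row key) pairs are assumptions, not part of the described embodiments.

```python
from typing import NamedTuple


class Change(NamedTuple):
    """Hypothetical change-stream entry."""
    op: str      # "INSERT", "UPDATE" or "DELETE"
    table: str


def last_n_are_inserts_on_same_table(changes, n):
    """Criterion 1: the last n entries are all Insert operations on one table."""
    if n <= 0 or len(changes) < n:
        return False
    tail = changes[-n:]
    table = tail[0].table
    return all(c.op == "INSERT" and c.table == table for c in tail)


def dependencies_of_latest(transactions):
    """Criterion 2: ids of earlier transactions that changed rows the latest one changes.

    transactions: list of (txn_id, set of (table, row_key)) in commit order.
    """
    if not transactions:
        return []
    _, latest_rows = transactions[-1]
    return [txn_id for txn_id, rows in transactions[:-1] if rows & latest_rows]


if __name__ == "__main__":
    stream = [Change("UPDATE", "Table2")] + [Change("INSERT", "Table1")] * 15
    print(last_n_are_inserts_on_same_table(stream, 15))
    txns = [
        ("x", {("Table1", 1)}),
        ("y", {("Table1", 2)}),
        ("z", {("Table2", 9)}),
        ("latest", {("Table1", 1), ("Table1", 2)}),
    ]
    print(dependencies_of_latest(txns))
```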

Program 200 identifies a queue of changes to a database (step 206). In some embodiments, program 200 identifies a change stream from a source database to a target database. For example, program 200 identifies a continuous stream of changes entered on source database 140, captured by capture program 110, and sent to apply program 130. In other embodiments, program 200 identifies various chunks of data which have been changed on a source database. In various embodiments, program 200 identifies chunks of changes in a change stream. In other embodiments, program 200 continuously monitors a change log or queue of capture program 110 as the changes occur. In some embodiments, program 200 may search for the largest set of changes that match the pattern. As an example, a pattern might be a series of Delete operations on the same table, and program 200 identifies a change that is not a Delete for that table. Program 200 can group together all the operations up to just before the differing one, which creates the largest contiguous set of Deletes.

Program 200 determines if a change in the queue matches a criterion for a hint (decision 208). In various embodiments, program 200 continuously monitors a change stream, searching for criteria that trigger a hint. In some embodiments, program 200 may have preset priorities for hints. For example, a user may select hints which are more common or save more time, and program 200 may search for criteria which trigger the high priority hints first. In some embodiments, program 200 searches through the change stream from step 206 and takes note of whether the current change could be part of a chunk of changes that could benefit from a hint. In an example, whenever program 200 identifies a second operation that is of the same type (Insert, Update, or Delete) and for the same table (e.g., Insert on Table1), then program 200 is beginning to see a chunk that could potentially be identified as part of a simple batch transaction or the result of a single original operation. When program 200 identifies an operation of a different type or for a different table, then program 200 identifies that the chunk is now complete. Once program 200 has a complete chunk, program 200 analyzes that chunk to determine if the chunk matches a criterion for a hint.
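The chunking described in this step can be sketched as grouping consecutive entries that share the same operation type and table. The example below is a minimal illustration; the Change type, the function names, and the min_size parameter are assumptions made for the sketch.

```python
from itertools import groupby
from typing import NamedTuple


class Change(NamedTuple):
    """Hypothetical change-stream entry."""
    op: str      # "INSERT", "UPDATE" or "DELETE"
    table: str


def chunk_changes(changes):
    """Split the stream into maximal runs with the same operation type and table.

    A chunk is complete as soon as an entry of a different type, or for a
    different table, is seen.
    """
    return [list(group) for _, group in groupby(changes, key=lambda c: (c.op, c.table))]


def batch_candidates(changes, min_size=2):
    """Chunks with at least min_size entries, i.e. where a second matching
    operation was seen; these would then be checked against hint criteria."""
    return [chunk for chunk in chunk_changes(changes) if len(chunk) >= min_size]


if __name__ == "__main__":
    stream = [Change("INSERT", "Table1")] * 3 + [
        Change("DELETE", "Table1"),
        Change("INSERT", "Table2"),
    ]
    for chunk in chunk_changes(stream):
        print(len(chunk), chunk[0])
```

Because the grouping only looks at adjacent entries, it also yields the largest contiguous run of a given operation on a given table before a differing entry.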

In some embodiments, program 200 determines if a change matches criteria for a hint based upon machine learning. For example, program 200 may recognize the same changes repeating and determine which hints may be valid based upon user feedback. In another embodiment, program 200 may track the amount of lag time between an apply program updating a target database and changes occurring to a source database. Program 200 may determine certain hints reduce the lag, and insert the determined hints more frequently. In other embodiments, program 200 may have cognitive computing abilities, which can recognize patterns as originating from a single action. In other embodiments, program 200 may work in conjunction with another computing device that has cognitive computing abilities that can recognize and determine patterns and determine matches to possible hints.

In some embodiments, program 200 may identify repeatable access patterns in the change stream. If program 200 identifies repeatable access patterns, program 200 may send a notification to an administrator to review or store the information in hint database 150 as a possible trigger for a hint. In other embodiments, program 200 may not send the repeatable access patterns to an administrator but rather store them as a trigger for a hint. For instance, if a distinct pattern always starts with a fixed set of Update/Insert/Delete (UID) operations to the same set of tables, program 200 may determine the earliest possible subsequence. The hint can be stored, which would enable the hint to be applied immediately. In an example where a match is a false positive, the false positive is detected, and the resulting re-application (or forward recovery) operations may be stored in hint database 150. In some embodiments, program 200 may consider many candidate hints at any time (e.g., program 200 keeps track of a specific chunk of triggering actions independently from the triggering actions of other candidate hints).

In some embodiments, program 200 may determine matching based on matching a pattern involving a number of changes (i.e., entries in the change stream). For example, with the addition of the next change from the change stream, program 200 checks whether any pattern of changes satisfies the criteria for a hint. In some embodiments, criteria for a hint may be a pattern of changes in a change stream. For example, a pattern of changes may be 15 rows deleted from a table consisting of 15 rows. Program 200 may determine that the deletion of 15 rows means the entire table has been deleted. Program 200 may determine that if a hint is added prior to the first change (e.g., deletion of row 1) which indicates deleting the entire table, an apply program may not have to take as many actions to accomplish the same goal. In this example, program 200 may insert a hint indicating an apply program should delete an entire table, and the hint is input in the change stream prior to the first delete action in the pattern of changes. If program 200 determines that a match in the criteria for a hint does not exist (no branch, decision 208), then program 200 returns to step 206.
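The table-deletion example can be sketched as follows. This is a minimal illustration under stated assumptions: changes are represented as (operation, table, row id) tuples, the row count of each table at the start of a run is known and passed in as row_counts, and the hint is a tuple with the hypothetical label DELETE_ENTIRE_TABLE. None of these names come from the description.

```python
def insert_table_delete_hints(changes, row_counts):
    """Return a copy of the stream with a hint placed before each run of Deletes
    that removes every row of a table.

    changes: list of (op, table, row_id) tuples in stream order.
    row_counts: dict of table name -> number of rows before the run (assumed known).
    """
    output = []
    i = 0
    while i < len(changes):
        op, table, _ = changes[i]
        # Extend the run while the operation type and table stay the same.
        j = i + 1
        while j < len(changes) and changes[j][0] == op and changes[j][1] == table:
            j += 1
        run = changes[i:j]
        deleted_rows = {row_id for _, _, row_id in run}
        if op == "DELETE" and len(deleted_rows) == row_counts.get(table):
            # The hint goes in front of the first triggering change.
            output.append(("HINT", table, "DELETE_ENTIRE_TABLE"))
        output.extend(run)
        i = j
    return output


if __name__ == "__main__":
    stream = (
        [("INSERT", "Table2", 1)]
        + [("DELETE", "Table1", row) for row in range(1, 16)]
        + [("UPDATE", "Table2", 1)]
    )
    hinted = insert_table_delete_hints(stream, {"Table1": 15, "Table2": 1})
    print(hinted[1])
```

An apply program that honors the hint could delete the table contents in one action and skip the 15 individual deletes; one that ignores it still sees the original changes unmodified.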

If program 200 determines that a change in the queue matches a criterion for a hint (yes branch, decision 208), then program 200 inserts the matching hint in front of the initial triggering change (step 210). In various embodiments, program 200 inserts hints into a change stream prior to an apply program passing the point of the hint. In one embodiment, program 200 adds additional hint information into the copy of the stream that program 200 publishes as output. In the embodiment, program 200 ensures that program 200 only performs analysis and delays moving data from the input to the output when a sufficient amount of data already exists in the output stream that an apply program has not read, so as to ensure that the apply program is not starved.
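One way to picture this embodiment is a relay that holds changes back for analysis only while the apply program still has enough unread output. The sketch below is an assumption-laden illustration: the HintingRelay class, its method names, and the min_unread parameter are hypothetical, and the analysis itself is left out.

```python
from collections import deque


class HintingRelay:
    """Copies changes from input to output, optionally prefixing a hint."""

    def __init__(self, min_unread):
        self.min_unread = min_unread  # assumed backlog needed before delaying data
        self.pending = []             # changes held back while being analyzed
        self.output = deque()         # published entries the apply program has not read

    def accept(self, change):
        """Take the next change from the input stream."""
        self.pending.append(change)
        if len(self.output) < self.min_unread:
            # Too little unread output: publish immediately rather than starve apply.
            self.flush()

    def flush(self, hint=None):
        """Publish held changes, placing the hint (if any) in front of them."""
        if hint is not None:
            self.output.append(hint)
        self.output.extend(self.pending)
        self.pending.clear()

    def read(self):
        """Called by the apply side; returns None when nothing is available."""
        return self.output.popleft() if self.output else None


if __name__ == "__main__":
    relay = HintingRelay(min_unread=2)
    for change in ["c1", "c2", "c3", "c4"]:
        relay.accept(change)
    relay.flush(hint="HINT")
    print(list(relay.output))
```

In this sketch the first two changes pass straight through because the output backlog is too small, while the later two are held until the analysis decides on a hint.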

In various embodiments, after program 200 determines a match exists, program 200 inserts the matching hint into the stream at the appropriate point (i.e., the point prior to the first triggering action of the hint). In some embodiments, an apply program may not wait for a hint; therefore, program 200 may insert hints that provide no benefit because the apply program has already read past that point in the change stream.

In some embodiments, a section of actions may have several hints, all of which are relevant, that an apply component might choose from. As an example, there may be a chunk of operations which are hinted as being a batch update on Table1, and program 200 provides a hint as to the exact source SQL statement that might have been run to cause all those updates. In some embodiments, program 200 may flag certain hints as higher priority based upon user settings or past performance.

FIG. 3 depicts computer system 300, which is an example of a system that includes components of computing device 102, computing device 104, and/or computing device 106. Computer system 300 includes processor(s) 304, cache 316, memory 306, persistent storage 308, communications unit 310, input/output (I/O) interface(s) 312, and communications fabric 302. Communications fabric 302 provides communications between cache 316, memory 306, persistent storage 308, communications unit 310, and input/output (I/O) interface(s) 312. Communications fabric 302 can be implemented with any architecture designed for passing data and/or control information between processors (such as microprocessors, communications and network processors, etc.), system memory, peripheral devices, and any other hardware components within a system. For example, communications fabric 302 can be implemented with one or more buses or a crossbar switch.

Memory 306 and persistent storage 308 are computer readable storage media. In this embodiment, memory 306 includes random access memory (RAM). In general, memory 306 can include any suitable volatile or non-volatile computer readable storage media. Cache 316 is a fast memory that enhances the performance of processor(s) 304 by holding recently accessed data, and data near recently accessed data, from memory 306.

Program instructions and data used to practice embodiments of the present invention may be stored in persistent storage 308 and in memory 306 for execution by one or more of the respective processor(s) 304 via cache 316. In an embodiment, persistent storage 308 includes a magnetic hard disk drive. Alternatively, or in addition to a magnetic hard disk drive, persistent storage 308 can include a solid-state hard drive, a semiconductor storage device, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by persistent storage 308 may also be removable. For example, a removable hard drive may be used for persistent storage 308. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of persistent storage 308.

Communications unit 310, in these examples, provides for communications with other data processing systems or devices. In these examples, communications unit 310 includes one or more network interface cards. Communications unit 310 may provide communications through the use of either or both physical and wireless communications links. Program instructions and data used to practice embodiments of the present invention may be downloaded to persistent storage 308 through communications unit 310.

I/O interface(s) 312 allows for input and output of data with other devices that may be connected to each computer system. For example, I/O interface(s) 312 may provide a connection to external device(s) 318, such as a keyboard, a keypad, and/or some other suitable input device. External device(s) 318 can also include portable computer readable storage media such as, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data 323 used to practice embodiments of the present invention, e.g., analysis program 120, source database 140, hint database 150, target database 160, and capture program 110, can be stored on such portable computer readable storage media and can be loaded onto persistent storage 308 of computing device 104 via I/O interface(s) 312 of computing device 104. I/O interface(s) 312 also connect to display 320.

Display 320 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The terminology used herein was chosen to best explain the principles of the embodiment, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

What is claimed is: 1-9. (canceled)
 10. A computer program product for analyzing change stream data, the computer program product comprising: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media, the program instructions comprising: program instructions to identify a list of changes in a replication stream; program instructions to determine if one or more changes in the list of changes matches a criteria for a hint; and in response to determining that one or more of the changes in the list of changes matches the criteria for a hint, program instructions to insert the hint into the list of changes prior to a first change of the one or more changes that triggers a match in criteria.
 11. The computer program product of claim 10, wherein program instructions to identify a list of changes in a replication stream comprises program instructions to: identify a list of changes occurring on a first database, wherein the list of changes in the first database are sent to a second database.
 12. The computer program product of claim 10, wherein program instructions to determine if one or more changes in the list of changes matches a criteria for a hint comprises program instructions to: identify criteria of one or more preset hints; review the list of changes in a replication stream; and determine if the identified criteria of the one or more preset hints matches the reviewed list of changes in a replication stream.
 13. The computer program product of claim 10, wherein criteria for a hint comprises one or more specific changes in the list of changes that occur in a specific sequence.
 14. The computer program product of claim 11, wherein a hint comprises an instruction for reducing the amount of changes needed in a second database to match changes made to a first database.
 15. The computer program product of claim 12, wherein program instructions to identify criteria of one or more preset hints comprises program instructions to: identify a pattern of changes in the identified list of changes; and determine if the pattern of changes matches the criteria for one or more preset hints.
 16. The computer program product of claim 10, wherein a list of changes in a replication stream further comprises program instructions to: identify a first sequence of changes in the list of changes; determine if the total number of changes in the sequence of changes in the list of changes could be reduced in a new sequence of changes by fewer than the total number of changes to result in the same outcome of changes; and create a new hint that matches the new sequence of changes.
 17. The computer program product of claim 16, wherein, the created hint comprises a new sequence of changes and a criteria for the created new hint is the first sequence of changes in the list of changes.
 18. A computer system for analyzing change stream data, the computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions stored on the computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to identify a list of changes in a replication stream; program instructions to determine if one or more changes in the list of changes matches a criteria for a hint; and in response to determining that one or more of the changes in the list of changes matches the criteria for a hint, program instructions to insert the hint into the list of changes prior to a first change of the one or more changes that triggers a match in criteria.
 19. The computer system of claim 18, wherein program instructions to identify a list of changes in a replication stream comprises program instructions to: identify a list of changes occurring on a first database, wherein the list of changes in the first database are sent to a second database.
 20. The computer system of claim 18, wherein program instructions to determine if one or more changes in the list of changes matches a criteria for a hint comprises program instructions to: identify criteria of one or more preset hints; review the list of changes in a replication stream; and determine if the identified criteria of the one or more preset hints matches the reviewed list of changes in a replication stream.
 21. The computer system of claim 18, wherein criteria for a hint comprises one or more specific changes in the list of changes that occur in a specific sequence.
 22. The computer system of claim 19, wherein a hint comprises an instruction for reducing the amount of changes needed in a second database to match changes made to a first database.
 23. The computer system of claim 20, wherein program instructions to identify criteria of one or more preset hints comprises program instructions to: identify a pattern of changes in the identified list of changes; and determine if the pattern of changes matches the criteria for one or more preset hints.
 24. The computer system of claim 18, wherein a list of changes in a replication stream further comprises program instructions to: identify a first sequence of changes in the list of changes; determine if the total number of changes in the sequence of changes in the list of changes could be reduced in a new sequence of changes by fewer than the total number of changes to result in the same outcome of changes; and create a new hint that matches the new sequence of changes.
 25. The computer system of claim 24, wherein, the created hint comprises a new sequence of changes and a criteria for the created new hint is the first sequence of changes in the list of changes.